Cross-Validation and Mean-Square Stability
Abstract
k-fold cross-validation is a popular practical method for estimating the error rate of a learning algorithm. The set of examples is first partitioned into k equal-sized folds. Each fold acts as a test set for evaluating the hypothesis learned on the other k − 1 folds. The average error across the k hypotheses is used as an estimate of the error rate. Although widely used, especially with small values of k (such as 10), the technique has heretofore resisted theoretical analysis. With only sanity-check bounds known, there is no compelling reason to use the k-fold cross-validation estimate over a simpler holdout estimate. The complications stem from the fact that the k distinct estimates have intricate correlations between them. Conventional wisdom is that the averaging in cross-validation leads to a tighter concentration of the error estimate around its mean. In this paper, we show that the conventional wisdom is essentially correct. We analyze the reduction in variance of the gap between the cross-validation estimate and the true error rate, and show that for a large family of stable algorithms, cross-validation achieves a near-optimal variance reduction factor of (1+o(1))/k. In these cases the k different estimates behave essentially independently of each other. To proceed with the analysis, we define a new measure of algorithm stability, called mean-square stability. Mean-square stability is weaker than most stability notions described in the literature, and encompasses a large class of algorithms, including bounded SVM regression and regularized least-squares regression, among others. For slightly less stable algorithms, such as t-Nearest-Neighbor, we show that cross-validation leads to an O(1/k) reduction in the variance of the generalization error.
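The k-fold procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's experimental setup: the learner is a one-dimensional regularized least-squares fit without intercept, the loss is squared error, and the names `fit_ridge_1d`, `k_fold_cv_error`, and the parameter `lam` are hypothetical choices made only for illustration.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Regularized least squares for y ~ w * x (no intercept); returns w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def k_fold_cv_error(xs, ys, k, lam=1.0):
    """Return (average error over the k folds, list of per-fold errors)."""
    n = len(xs)
    if n % k != 0:
        # The abstract assumes equal-sized folds, so the sketch enforces that.
        raise ValueError("number of examples must be divisible by k")
    fold_size = n // k
    fold_errors = []
    for i in range(k):
        lo, hi = i * fold_size, (i + 1) * fold_size
        # Train on the other k - 1 folds.
        w = fit_ridge_1d(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:], lam)
        # Evaluate the learned hypothesis on the held-out fold (squared loss).
        held_out = zip(xs[lo:hi], ys[lo:hi])
        fold_errors.append(sum((w * x - y) ** 2 for x, y in held_out) / fold_size)
    return sum(fold_errors) / k, fold_errors


# Synthetic data (an assumption): y = 2x plus small Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
cv_estimate, fold_errors = k_fold_cv_error(xs, ys, k=10)
```

Each entry of `fold_errors` on its own is a holdout-style estimate computed from one fold; `cv_estimate` is their average, which is the quantity whose variance the abstract analyzes.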